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ABSTRACT 

This work describes the application of a control theory-based aero- 
dynamic shape optimization method to the problem of supersonic 
aircraft design. The design process is greatly accelerated through 
the use of both control theory and a parallel implementation on dis- 
tributed memory computers. Control theory is employed to derive 
the adjoint differential equations whose solution allows for the evalu- 
ation of design gradient information at a fraction of the computational 
cost required by previous design methods [13, 12, 44, 38). The re- 
sulting problem is then implemented on parallel distributed memory 
architectures using a domain decomposition approach, an optimized 
communication schedule, and the MPI (Message Passing Interface) 
Standard for portability and efficiency. The final result achieves very 
rapid aerodynamic design based on higher order computational fluid 
dynamics methods (CFD). 

In our earlier studies, the serial implementation of this design 
method [19, 20, 21, 23, 39. 25, 40, 41, 42, 43, 9] was shown to be 
effective for the optimization of airfoils, wings, wing-bodies, and 
complex aircraft configurations using both the potential equation 
and the Euler equations[39, 25). In our most recent paper, the Euler 
method was extended to treat complete aircraft configurations via a 
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new multiblock implementation. Furthermore, during the same con- 
ference, we also presen ted prelim in ary resultsdemonstrating that this 
basic methodology could be ported to distributed memory parallel 
computing architectures [24]. In this paper, our concern will be to 
demonstrate that the combined power of these new technologies can 
be used routinely in an industrial design environment by applying it 
to the case study of the design of typical supiersonic transport con- 
figurations. A particular difficulty of this test case is posed by the 
propulsion/airframe integration. 

INTRODUCTION 

To realize the potential of CFD to produce superior designs, there 
is a need not only for accurate aerodynamic prediction algorithms, 
but also for design methods capable of creating new optimum con- 
figurations. Yet, while flow analysis has matured to the extent that 
Navier-Stokes calculations are routinely carried out over very com- 
plex configurations, CFD-based design techniques are just beginning 
to treat moderately complex three-dimensional configurations. 

Existing CFD analysis methods have previously been used to treat 
the design problem by coupling them with numerical optimization 
methods (13, 12, 44, 38], The essence of these methods, which 
incur heavy computational expense, is very simple: a numerical 
optimization procedure is used to extremize a chosen aerodynamic 
figure of merit which is evaluated by the given CFD code. The con- 
figuration is systematically modified through user specified design 
variables. Most of these optimization procedures require the evalu- 
ation of the gradient of the cost function with respect to the specified 
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design variables. The simplest of the methods to obtain these neces- 
sary gradients is the finite difference method. In this technique, the 
gradient components are obtained by independently perturbing each 
design variable with a finite step, calculating the corresponding value 
of the objective function using CFD analysis, and forming the ratio 
of the differences. The gradient is used by the numerical optimiza- 
tion algorithm to calculate a search direction using steejjest descent, 
conjugate gradient, or quasi-Newton techniques. After finding the 
minimum or maximum of the objective function along the search 
direction, the entire process is repeated until the gradient approaches 
zero and further improvement is not possible. 

The finite difference-based optimization strategy is computation- 
ally expensive because the flow must be repeatedly calculated for 
perturbations in every design variable. Nevertheless, it is attractive 
when compared with other traditional design strategies such as in- 
verse methods, since it permits any choice of the aerodynamic figure 
of merit. The use of numerical optimization for transonic aerody- 
namic shape design was pioneered by Hicks, Murman and Vander- 
plaats [13]. They applied the method to two-dimensional profile 
design governed by the p>otential flow equation. The method was 
quickly extended to wing design by Hicks and Henne [12]. Later, 
in the work of Reuther, Cliff, Hicks and Van Dam, this method was 
successfully used for the design of sup>ersonic wing-body transport 
configurations [38]. Howeverallof these cases, which were confined 
to finite difference gradients on serial computer architectures, were 
limited in their geometric complexity simply due to computational 
expense. For example, the designs presented in [38] were limited to 
wing-body configurations. Yet it is well known that optimum perfor- 
mance (especially for supersonic configurations) will require highly 
tuned nacelle/airframe integrations. It was not possible to include 
nacelle/airframe considerations into the design problem outlined in 
[38] since the required number of mesh points, which more than 
doubles with the inclusion of nacelles, could not be afforded. 

In the last few years, alternative methods for obtaining design 
sensitivities have been developed which greatly reduce the compu- 
tational cost of optimization. References [1, 2, 3, 5, 4, 7, 8, 6, 32, 
29, 16, 35, 30, 28, 34, 47, 31, 36, 37, 49, 15, 33, 14] present a 
partial list of recent works in this developing area of research. An 
exhaustive report on the various approaches to the problem and their 
advantages and disadvantages is given by the first author in [45). 
The most promising of these approaches is the adjoint formulation 
whereby the sensitivity of some objective function with respect to an 
arbitrary number of design variables is obtained with the equivalent 
of only one additional flow calculation. Here, the solution of the 
adjoint system (using the same mature techniques as perfected for 
the flow equations) enables each gradient element to be calculated 
very cheaply, meaning the number of design variables is essentially 
eliminated as a constraining factor. Moreover, the adjoint solu- 
tion (and to a lesser extent the accompanying flow solution) need 
not be highly converged to be useful, in significant contrast to the 
highly-converged flow solutions which are crucial to accurate finite 
difference gradients [45]. 

In spite of the large decrease in computational cost provided by 
the adjoint formulation of the design problem, the aerodynamic op- 
timization of a complete configuration still remains a formidable 
computational task. The advent of reliable and efficient parallel 
computers using distributed memory is a key enabling technology 
to decrease the turnaround time of these design calculations to the 


point where configurations can be optimized almost in real time. 

The work presented in this paf>er combines these two ingredi- 
ents (adjoint formulations and parallel implementations) to produce 
a robust, accurate, and efficient method that can be used for the de- 
sign of supersonic aircraft including the effects of airframe/nacelle 
integration. 


FORMULATION OF THE ADJOINT EQUATIONS 


The aerodynamic properties which define the cost function / are 
functions of the flow field variables, lu, and the physical location of 
the boundary, which may be represented by the function T . That is, 

/ = l{w,T) 


and a change in T results in a change 


dV 


dr 




( 1 ) 


in the cost function. The governing equation R and its first variation 
express the interdependence of w and T within the flow field domain 
D: 


R(w,D = 0, + 

Introducing a Lagrange multiplier we have 


dR 


dT 


8T 


( 2 ) 
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Choosing to satisfy the adjoint equation 
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the first term is eliminated, and we find that the desired 
given by 
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dr 




dR 

dT 


gradient 


(4) 


is only a function of T. Since (4) is independent of 8w, the gradient 
of / with respect to an arbitrary number of design variables can be 
determined without the need for additional flow field evaluations. 
The main cost is in solving the adjoint equation (3). In general, the 
adjoint problem is about as complex as a flow solution. Therefore, 
when the numberof design variables is larger than 2, it becomes com- 
pelling to take advantage of the cost differential between one adjoint 
solution and the large number of flow field evaluations required to 
determine the gradient by finite differences. Once equation (4) is 
obtained, Q can be provided to a variety of numerical optimization 
algorithms to obtain an improved design. 


MULTIBLOCK FLOW SOLUTION 

In order to extend the methods presented in our earlier three- 
dimensional work to the treatment of complete aircraft configura- 
tions, the single-block flow solver used in [21, 40, 42] must be 
replaced. As with the single-block solver, the more general flow 
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solver must meet fundamental requirements of accuracy^ efficiency, 
and robust convergence to be employed in an automated design 
environment. High accuracy is required since the predicted im- 
provements in the design realized by the method can only be as good 
as the accuracy of the flow analysis. Efficiency of the flow solver 
is also critical since the optimization of the design will generally 
require the computation of many flow solutions or other solutions 
of comparable complexity. Finally, robust convergence is also of 
significant importance since the main benefit of aerodynamic opti- 
mization is in obtaining the last few percentage points in improved 
efficiency. The solutions must be converged well enough that the 
noise in the figure of merit, say drag at a fixed lift, is well below 
the level of realizable improvement. The desirable ability to com- 
pare adjoint-based gradients with finite differencing as a check also 
requires highly-converged flow solutions. 

In our three-dimensional single-block applications, the FL087 
code written by the fourth author readily met all of the above crite- 
ria. FL087 achieves fast convergence with the aid of multigridding 
and residual smoothing. It is normally easy to obtain solutions that 
converge to machine accuracy. The challenge in the present work 
was to meet these strict requirements within the framework of a 
multiblock flow solver. The use of a multiblock approach is a first 
step towards the treatment of more complex configurations. How- 
ever, the multiblock strategy presented here is not the only viable 
approach. Other alternatives such as unstructured mesh solvers are 
also currently under investigation. 

The general strategy in developing the multiblock flow solver is 
to construct and update a halo of cells around each block such that 
the flow solution inside each block is transparent to the block bound- 
aries. This task requires establishing the size and location of halo 
cells adjacent to block boundaries and loading the halo cell values 
with appropriate flow field data at the appropriate times. To accom- 
plish this task, a two-level halo is constructed around each block. 
The requirement of this double halo results from the necessity of 
preserving a complete stencil of calculated fluxes entering and leav- 
ing each cell in the entire domain without regard to block boundaries. 
This ensures that the conservative flow solution algorithm is fully 
maintained. Since both the convective and the dissipative fluxes 
are calculated at the cell faces (boundaries of the control volumes), 
all six neighboring cells are necessary, thus requiring the existence 
of a single level halo for each block in the multiblock calculation. 
The dissipative fluxes are composed of a blend of first and third or- 
der differences corresponding to terms that mimic second and fourth 
derivatives of the flow quantities [26]. For the third order differences 
at the boundary faces of each cell for all blocks, the presence of the 
twelve neighboring cells (two adjacent to each face) is required. For 
each cell within a block, Figure 1 shows the neighboring cells that are 
required for the calculation of convective and dissipative fluxes. For 
each block, some of these cells will lie directly next to an interblock 
boundary, in which case, the values of the flow variables residing in 
a different block will be necessary to calculate the convective and 
dissipative fluxes. Halo cells on the external boundary of the entire 
computational domain are constructed and updated by extrapolation 
and reflection, depending on the kind of boundary condition ap- 
plied. Once the halo configuration is set up for each block, standard 
methods for spatial discretization and time integration (including ar- 
tificial dissipation, implicit residual averaging, and multigridding) 
are employed to compute the flow solution within each individual 



Figure 1 : Convective and Dissipative Discretization 
Stencils. 


block. 

The strategy for a complete flow solution proceeds as follows: 
First, the blocks that comprise the flow field mesh are read from an 
external file. Then, the double halo configuration is established, for 
each individual block, by inserting into halo cell locations values for 
grid metrics, etc., taken from the interior cells of adjacent blocks. 
For the coarse grids required in the multigrid procedure, the process 
is repeated with coarse grid halo cells defined by the internal cells 
of adjacent coarse grid blocks. For block faces that lie on solid, 
symmetry, or far field boundaries, standard single-block techniques 
are used to define the halo cells. As an example, consider the simple 
4-block grid depicted in Figure 3. The halo cells for block I will be 
obtained from the internal cells of blocks II, III, and IV, and from 
solid or far field boundary techniques for the faces not adjacent to 
other blocks. Coarse grids are computed in the usual fashion, by 
aggregating groups of eight cells and then repeating the above halo 
cell process. Once the halo configuration is complete for the fine and 
all coarse grids, the flow solution commences. 

The system of equations solved here as well as the solution strategy 
follows that presented in many earlier works [26, 18, 17]. The three- 
dimensional Euler equations may be written as 


dw dft ^ ^ 


(5) 


where it is convenient to denote the Cartesian coordinates and ve- 
locity components by xi, X 2 , 2:3 and U], U 2 , U 3 , and w and /, are 
defined as 
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where is the Kronecker delta. Also, 

p = (7) 


3 


and 


pH=pE + p, ( 8 ) 

where 7 is Ihe ratio of the specific heats. Consider a transformation 
to coordinates fi, ^ 2 . 6 where 

Introduce scaled contravahant velocity components as 


2L 

dxj 


= QtjUj 

where 

Q = JK~\ 

The Euler equations can now be written as 


with 
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For the multiblock case, the above notation applies to each block 
in turn. The flow is thus determined as the steady^state solution to 
equation (9) in all blocks subject to the flow tangency conditions on 
all solid boundary faces of all blocks: 


Un=0 onallBs (11) 

where is 1, 2, or 3 depending on the direction that is normal 
to the face Bs where a solid surface is assumed to exist. At the 
far field boundary faces. Bp, ffeestream conditions are specified 
for incoming waves, while outgoing waves are determined by the 
solution. 

The time integration scheme follows that used in the single^block 
strategy (26]. The solution proceeds by performing the cell flux 
balance, updating the flow variables, and smoothing the residuals, at 
each stage of the time stepping scheme and each level of the multigrid 
cycle. The main difference in the integration strategy is the need to 
loop over all blocks during each stage of the integration process. The 
use of the double-halo configuration permits standard single-block 
subroutines to be used, without modification, for the computation of 
the flow field within each individual block. This includes the single- 
block subroutines for convective and dissipative flux discretization, 
multistage time stepping, and multigrid convergence acceleration. 

The only difference between the integration strategies is in the 
implementation of the implicit residual averaging technique. In the 
single-block solution strategy, a tridiagonal system of equations is 
set up and solved using flow information from the entire grid. Thus, 
each residual is replaced by an average of itself and the residuals of 
the entire grid. In the multiblock strategy, the support for the implicit 
residual smoothing is reduced to the extent of each block, in order 


to eliminate the need to solve large tridiagonal systems spanning the 
blocks, which would incur a penalty in communication costs and 
may not even be defined. This change has no effect on the final 
converged solution, and in the present application has not led to any 
significant reduction in the rate of convergence. 

THE ADJOINT FORMULATION FOR THE EULER EQUA- 
TIONS 

The application of control theory to aerodynamic design problems 
is illustrated by treating the case of three-dimensional design, using 
the Euler equations discussed above as the mathematical model for 
compressible flow. In our previous work, the illustrative problem 
most often used specified the cost function as a measure of the differ- 
ence between the current and some desired pressure distribution. In 
the case of transonic flows over conventional commercial transport 
wings this aerodynamic figure of merit proves to be very effective 
since the tailoring of these pressure distributions to achieve close to 
optimum performance is well understood by most aeiodynamicists. 
However, for the case of supersonic design of three-dimensional 
configurations, the specification of pressure distributions that will 
determine near optimum performance is a considerably more chal- 
lenging problem. Thus the development here will focus on the more 
salient problem for supersonic design: drag minimization at a fixed 
lift. 


I — Cd 

= Ca cos nr -f Cn sin a 

= 3 / / Cp (5x cos nr + sin or) 

^ref J Jbs 

where Sx and Sy are projected surface areas, is the reference 
area, and d(i and are the two coordinate indices that define the 
plane of the face in question. Note that the integral in the final 
expression above is carried out over all solid boundary faces. The 
design problem is now treated as a control problem where the control 
function is the geometry shape, which is to be chosen to minimize 
/ , subject to the constraints defined by the flow equations (5—10). A 
variation in the shape will cause a variation 8p in the pressure and 
consequently a variation in the cost function 


61 = 
4- 

4 


01 ^ A COS a -f- oc/v sin a 
{ —Ca sin or 4 cos a } 


/ ac. 

I da 


■ cos a 4 


dCr, 

da 


1 Qf|‘ 


Sa 


where 6Ca and SCn are variations due to changes in the design 
parameters with o fixed. To treat the interesting problem of practical 
design, drag must be minimized at a fixed lift coefficient. Thus an 
additional constraint is given by 


6Cl = 0 , 


which gives 


4 
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Combining these two expressions to eliminate gives 

51 = cos ft -f 6C/ST sin fv 

{ SCfsf cos ft - 5Ca sin ft } , 


where H is given by 




(-C,+ ^cos» + ^s.ng) 
(C’o + ^cosa - ^sin«) ' 


( 12 ) 


Suppose now that ^ is the steady-state solution of the adjoint 
equation 


drl) X dtl> 

Ji dT. 


0 in D. 


(17) 


At internal block boundaries, the face integrals cancel from the ad- 
jacent blocks. At the far field the choice of the adjoint boundary 
conditions depends on whether the flow is subsonic or supersonic. 
For subsonic flow, so long as the outer domain is very far from the 
configuration of interest, we may set 


=0 on all Bf- 


Since p depends on w through the equation of state (7-8), the varia- 
tion 5p can be determined from the variation 5w. If a fixed computa- 
tional domain is used, the variations in the shape result in variations 
in the mapping derivatives. Define the Jacobian matrices 

>1. = 1^, C, - Q„Aj. (13) 

Then the equation for 5 w in the steady state becomes 

(«F.) = 0, 

where 

SF,=C,8w + S{Q.j)fj. 

Now, multiplying by a vector co-state variable 0, assuming the result 
is differentiable, and integrating by parts over the entire domain. 


/ J (14) 


where n, are components of a unit vector normal to the boundary. 
The variation in the cost function can also be expressed in terms of 
6p after (12) and (14) are summed to give, 
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On the solid surfaces Bst m = «2 = 0. It follows from equation 
(11) that 
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If, however, the flow is subsonic and the boundary is fairly close, 
then far field faces may be set by V^i-5 = 0 for incoming waves, 
while outgoing waves are determined by the solution. It is noted 
that the waves in the adjoint problem propagate in the opposite 
direction to those in the flow problem due to the transpose in equation 
(17). For supersonic flows, the choice of boundary conditions at the 
outer domain can be developed from physical intuition as well as 
mathematical analysis. For a given geometry, say a wing, a change 
in the surface at any particular point, Vy will incur changes in the 
flow field and hence the performance in the region defined by the 
Mach cone originating at V. Similarly, it is possible to determine 
the region over which surface changes affect the flow condition at a 
given point. This region would also form a cone that would point 
exactly in the opposite direction of the Mach cone, depending on 
local conditions. It is the solution of this reverse problem that the 
adjoint represents. The contribution to, say, drag at a given point is 
influenced by changes to the surface at all points within the reverse 
cone. The correct supersonic far field boundary conditions for the 
adjoint equation that are consistent with this reversed character are: 


Vn -5 = 0 at the exit; 

01-5 extrapolated from the interior at the inflow boundary. 

Then if the coordinate transformation is such that 6 (7 A'~^) is neg- 
ligible in the far field, the last integral in (15) reduces to 





Thus by letting V' satisfy the boundary conditions, 


(V^ 2 Qt 71 -f ^nQn 2 + V»4Qt73) = Q on all Bsy 
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where 


^ “ I ,i2 c — i ") 
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we find after integrating by parts again that 
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which is independent of tv. 
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MULTIBLOCK MESH VARIATIONS AND DESIGN VARI- 
ABLES 

In order to construct 61 in equation (20), the variation in the metric 
terms must be obtained in each block. One way to accomplish this is 
to use finite differences to calculate the necessary information. This 
approach avoids the use of multiple flow solutions to determine the 
gradient, but it unfortunately still requires the mesh generator to be 
used rep>eatedly. The number of mesh generations required is propor- 
tional to the number of design variables. The inherent difficulty in 
the approach is two-fold. First, for complicated three-dimensional 
configurations, elliptic or hyperbolic partial differential equations 
must often be solved iteratively in order to obtain acceptably smooth 
meshes. These iterative mesh generation procedures are often com- 
putationally expensive. In the worst case they approach the cost of 
the flow solution process. Thus the use of finite difference meth- 
ods for obtaining metric variations in combination with an iterative 
mesh generator leads to computational costs which strongly hinge on 
the number of design variables, despite the use of an adjoint solver 
to eliminate the flow variable variations. Second, multiblock mesh 
generation is by no means a trivial task. In fact no method currently 
exists that allows this to be accomplished as a completely automatic 
process for complex three-dimensional configurations. 

In our earlier works [40, 39, 25, 19, 20, 21), two methods have 
been explored which avoid these difficulties. In the first method, 
a completely analytic mapping procedure was used for the mesh 
generation. This technique is not only fully automatic and results in 
smooth consistent meshes, but it also allows for complete elimination 
of finite difference information for the mesh metric terms. Since 
the mapping function fully determines the entire mesh based on the 
surface shape, this analytic relationship may be directly differentiated 
in order to obtain the required information without considering a 
finite step. An analytic mapping method requires the geometry 
topology to be built directly into the formulation, and only works for 
simple configurations. Nevertheless, within these limitations it has 
proven to be highly effective [19, 20, 211. 

The second method that we have explored is the use of an analytic 
mesh perturbation technique. In this approach, a high quality mesh 
appropriate for the flow solver is first generated by any available 
procedure prior to the start of the design. In examples to be shown 
later, these meshes were created using the Gridgen software devel- 
oped by Pointwise Inc.[46|. This initial mesh becomes the basis for 
all subsequent meshes which are developed by analytical perturba- 
tions. In the method that was previously developed for wing-body 
configurations it had been assumed that only one surface, say the 
wing, was perturbed during a design case. This permitted the use of 
a very simple algebraic mesh perturbation algorithm. New meshes 
are created by moving all the mesh points on an index line projecting 
from the surface by an amount which is attenuated as the arc length 
from the surface increases. If the outer boundary of the grid domain 
is held constant the modification to the grid has the form 

^ ^oid ^ ^oid ^ 21 ) 

where X, represents the volume grid points, x^, represents the surface 
grid points and 5 represents the arc length along the radial mesh line 
measured from the outer domain, normalized so that 5 = 1 at the 
inner surface. Unfortunately this simple logic breaks down in the 
case where multiple faces sharing common edges are allowed to 


move. Thus in order to use analytic mesh perturbations for the 
treatment of the more general problem where multiple faces of a 
given block may be simultaneously deformed, equation (21) had to 
be modified in a way that resembles transfinite interpolation (TFI) 
[48]. Unlike TFI, where there is no prior knowledge of the interior 
mesh, the perturbation algorithm developied here (WARP3D) does 
make use of the relative interior point distributions in the initial mesh. 

The WARP3D algorithm has been modified from that presented 
in reference [43] and is now a three stage procedure. The first 
stage shifts the internal mesh points to produce an interim block that 
is determined entirely by the new locations of the 8 comer points 
defining the block. Corresponding to the motion of each comer point, 
each interior point is shifted by a displacement proportional to one 
minus the normalized distance along the index lines away from that 
comer point. The second stage corrects the perturbations resulting 
from the first stage by determining the distance each of the 12 edges 
of the stage 1 block needs to be moved to attain the desired edge 
locations. These perturbations are then also incorporated into the 
volume mesh points through a weighting scheme that is proportional 
to the relationship of an individual edge jxiint motion and the volume 
point in question. Finally with both comer and edge point motion 
accounted for, the third stage checks the perturbation of each point 
in all six faces relative to the position of the corresponding point 
in the stage 2 block. If the perturbation of the domain involves 
a simple translation of all boundary points, the relative changes 
from stages 2 and 3 will be zero and all the perturbation will be 
accomplished by stage 1. If, however, face points are perturbed 
relative to the reference block, then these changes are propagated 
to the interior points through relative arc length -based perturbations 
along projecting index lines. In general all 3 stages are required. 
The idea of WARP3D is to use an initial mesh with good quality 
attributes as a starting point, and then systematically perturb this 
mesh in a manner such that the original grid quality is maintained, 
without the need for expensive elliptic smoothing. 

Since our current flow solver and design algorithm assume a point- 
to-point match between blocks, each block may be independently 
perturbed by WARP3D, provided that perturbed surfaces are treated 
continuously across block boundaries. The entire method of creating 
a new mesh is given by the following algorithm. 

1 . All faces that are directly affected by the design variables (active 
faces) are explicitly perturbed. 

2. All edges that touch an active face, either in the same block or 
in an adjacent block, are implicitly perturbed by (21). 

3. All inactive faces that either include an implicitly pierturbed 
edge or abut to an active face are implicitly perturbed by a 
quasi-3D form of WARP3D. 

4. WARP3D is used on each block that has one or more explicitly 
or implicitly perturbed faces to determine the adjusted interior 
points. 

Note that much of the mesh, especially away from the surfaces, will 
not require mesh perturbations and thus may remain fixed through 
the entire design process. Close to the surfaces, many blocks will 
either contain an active face or touch a block which contams an 
active face, either by an edge or by a comer. As the design variations 
affect the active faces, the above scheme ensures that the entire mesh 
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will remain attached along block boundaries. Added complexity is 
needed to accomplish step (2) since the connectivity of the various 
edges and comers must be indicated somehow. Currently, a list of 
master edges and master comers is determined as a preprocessing 
step. During the design calculation, these lists are updated and 
transferred to all connected edges and comers as the mesh is moved. 

Since this mesh perturbation algorithm is analytic it is possible 
to work out the analytical variations in the metric terms required 
for equation (20). This approach was followed in reference [40]. 
However since the mesh perturbation algorithm that was used in 
the current paper was significantly more complex, and it was dis- 
covered that the computational cost of repeatedly using the block 
perturbation algorithm was minimal, finite differences were used to 
calculate 8Qtj instead of deriving the exact analytical relationships. 
Even in cases with hundreds of design variables, the computational 
cost of repeatedly re-evaluating SQ^j for all necessary blocks is 
still insignificant compared with the cost of a single flow solution. 
The conclusion is that the analytical mesh perturbation algorithm, 
WARP3D, unlike an elliptical mesh generation method, is efficient 
to the extent that the cost of remeshing can be neglected. 

It remains to choose a set of design variables which smoothly 
modifies the original shape, say 6,. The gradient can then be defined 
with respect to these design variables as 

«(».)= f. (22) 

where 6/ is calculated by (20) and each term 6* is independently 
perturbed by a finite step. Therefore, to construct Q, a basis space of 
independent perturbation functions 6,, i = 1 , 2, . . . , n (n = number 
of design variables) must be chosen to allow for the needed freedom 
of the design space. In this work, design variables were chosen as a 
set of Hicks-Henne functions simply for their ease of implementation 
and their proven reliability. The form of the Hicks-Henne functions 
which were initially proposed in Reference [12] is given by: 

I , 0 < I < 1 (23) 

Here ri locates the maximum of the bump in the range 0 < x < 
1 at X = ri, since the maximum occurs when x^ = j, where 
a = log ^ / log < 1 , or o log = log The parameter i 2 controls the 
width of the bump. To generalize the application of the Hicks-Henne 
functions, which have traditionally been used for the modification of 
airfoil sections where the x in equation (23) refers to the chordwise 
position, they are applied directly to a parameterized (u, li) surface 
which may be composed of one or more faces in different blocks. 
The parameterization may be accomplished in many ways. For this 
study, the wing is designed by projecting all surface points to be 
perturbed onto a plane and normalizing by the planform outline. 
Thus the Hicks-Henne shape functions may be applied as functions 
in either the u, D, or both directions. The design code is further 
structured so that these variables may be applied to any subset of 
the parametric surface. Alternatives are provided such that these 
variables may be linearly lofted in the second direction as opposed 
to Hicks-Henne functions in both directions. All of these options 
may be prescribed at the input level, leading to a highly versatile 
design code in which one or more faces in the multiblock domain 
may be perturbed by the design variables. To enforce geometric 
constraints, each design variable may be activated on more than one 


face. For example, if the thickness of a wing is to be preserved and 
the upper and lower halves of the wing are in separate blocks, then 
the design variables need to be applied at the proper locations with 
the proper weights and on the appropriate faces in both blocks such 
that thickness does not change while both surfaces are allowed to be 
modified. 

DOMAIN DECOMPOSITION AND PARALLEL IMPLEMEN- 
TATION 

The main strategies that are used to accomplish the parallelization of 
the design code are: a domain decomposition model, a SPMD (Single 
Program Multiple Data) strategy, and the MPI (Message Passing 
Interface) Library for message passing. The choice of message 
passing library was determined by the requirement that the resulting 
code be portable to different parallel computing platforms as well as 
to homogeneous and heterogeneous networks of workstations. 

As one can see from the previous sections, obtaining the desired 
parallelization by domain decomposition entails the treatment of four 
separate parts: the solution of the flow equations, the solution of the 
adjoint equations, the calculation of the mesh perturbations, and the 
calculation of the gradient integral formulas. No attempt is made 
to parallelize the quasi-Newton optimization algorithm. It is thus 
assumed in this construct that the determination of the step sizes and 
search directions provided by the optimization algorithm is com- 
putationally insignificant when compared with the other elements 
necessary during the design. 

Since the flow and adjoint equations are to be solved using ex- 
actly the same efficient numerical techniques, the same paralleliza- 
tion techniques used for the flow equations apply to the solution of 
the adjoint equations. Therefore, all details of the parallel imple- 
mentation corresponding to these first two parts of the program will 
be explained with reference only to the flow equations. Further- 
more, since the mesh perturbation algorithm WARPS D also works 
on a block-by-block basis, the communication necessary to main- 
tain mesh consistency can also be addressed by the same domain 
decomposition strategies that are used for the state and costate fields. 

The subdomains of the flow solution resident on each processor are 
divided along the block boundaries such that one or more complete 
blocks are allocated to each processor. This has the natural conse- 
quence that the communication between subdomains is performed 
through the same halo cells that were described earlier for the multi- 
block computations, only now, the interior cells corresponding to 
given halo cells might reside in a different processor. An alternative 
to this would be to partition the complete problem along the three 
coordinate directions for each of the blocks in the mesh. Since the 
sizes of the blocks can be quite small, this further partitioning would 
severely limit the number of multigrid levels that could be used in 
the flow and adjoint solutions. The underlying assumption is the fact 
that there always will be more blocks in the mesh than processors 
available. If this is the case, every processor in the domain would be 
responsible for the computations inside one or more blocks. In the 
case in which there are more processors than blocks available, the 
blocks can be adequately split during a pre-processing step in order 
to have at least as many blocks as processors. This approach has 
the advantage that the number of multigrid levels that can be used in 
the parallel implementation of the code is always the same as in the 
serial version, and is only limited by the size of the smallest block in 
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the mesh. 

One advantage of the assignment of complete blocks to a given 
processor is that the number of processors in the calculation can be 
an arbitrary integer. The drawback of this approach is the loss of the 
exact load balancing that can be achieved by coordinate direction 
partitioning. All blocks in the calculation can have different sizes, 
and consequently, it is very likely that different processors will be 
assigned a different total number of cells in the calculation. This, in 
turn, will imply that some of the processors will be waiting until the 
processor with the largest number of cells has completed its work 
with the result being that the parallel performance will suffer. The 
approach that we have followed to solve the load balancing problem 
is to assign to each processor, in a pre-processing step, a certain 
number of blocks such that the total number of cells in each processor 
is as close as possible to the exact share for perfect load balancing. 
In practice, this approach yields quite good load balancing [24]. One 
must note that load balancing based on the total number of cells in 
each processor is only an approximation to the optimal solution of 
the problem. Other variables such as the number of blocks, the block 
locations, the size of each block, and the size of the buffers to be 
communicated play an important role in proper load balancing, and 
are the subject of current study. 

Now, within each processor, there will be several blocks that need 
to communicate with their neighboring blocks. The data for these 
neighboring blocks can reside in a different processor, and therefore, 
communication is necessary. In order to minimize communication 
cost, it was decided to pack all data that needed to be communicated 
from one processor to another in one single message, regardless of 
the number of blocks that resided in each of the processors. Within 
each processor, the data for the flow variables, adjoint variables, and 
grid locations are stored on a large one-dimensional array. In order 
to accomplish this type of communication, during the pre-processing 
step, each processor compiles a pointer list with all the entries in these 
large arrays that need to be sent to all other processors. Similarly, 
another pointer list for the locations of the data to be received is 
also setup. At the time of communication exchanges each processor 
communicates all the information for the blocks that it contains to 
those processors that need to receive it. The communication is 
implemented using asynchronous (non-blocking) send and receive 
MPI constructs. 

The final formulas for the gradient of the cost function include a 
number of volume and surface integrals (see [22]) that can also be 
calculated in parallel- The nature of the integrations is such that, 
from a domain decomposition point of view, the problem is nearly 
embarrassingly parallel, since the partial values of the integrals in 
each domain can be calculated without the need for any communi- 
cation at all. Once the individual processors have completed their 
calculation, a single global MPI reduce operation can combine the 
results into the total gradients, which can then be passed to the nu- 
merical optimization algorithm to develop the shape change at each 
design iteration. 


COMPLETE MULTIBLOCK DESIGN ALGORITHM (SYN87- 
MB 

With all the necessary components defined for the multiblock adjoint- 
based design, it is now possible to outline the complete procedure: 


1 . Decompose the multiblock mesh into an appropriate number of 
processors, and create lists of pointers for the communication 
of the processor halo cells. 

2. Solve the flow field governing equations (5-10). 

3. Solve the adjoint equations (17) subject to the boundary condi- 
tion (19). 

4. For each of the n design variables repeat the following: 

• Perturb the design variable by a finite step according to 
(23, etc.). 

• Explicitly p>erturb all faces affected by the design variable. 

• Implicitly perturb all faces that share an edge with an 
explicitly perturbed design variable. 

• Obtain the new internal mesh point locations via 
WARP3D for those blocks with perturbed faces. 

• Calculate all the delta metric terms, SQt j, within those 
blocks that were perturbed by finite differencing. 

• Integrate equation (20) to obtain 61 for those blocks that 
contain nonzero 

• Determine the gradient component by equation (22). 

5. Calculate the search direction and perform a line search. 

6. Return to (2) if minimum has not been reached. 

The basic method here builds on that used in reference [40] with the 
proper extensions to treat multiblock domains. In order to imple- 
ment the method, equation (17) and boundary condition (19) must 
be discretized on the multiblock domain. In the current implemen- 
tation, a cell centered, central difference stencil that mimics the flux 
balancing used for the flow solution is used. Since this choice of 
discretization differs from the one obtained if the discrete flow equa- 
tion Jacobian matrix were actually transposed to form the adjoint 
system, the gradients obtained by the present method will not be 
exactly equal to the gradients calculated by finite differencing the 
discrete flow solutions. However, as the mesh is refined these differ- 
ences should vanish. Continuing, the adjoint system so discretized 
is solved on the multiblock domain in a fashion identical to that 
used for the flow solution. Therefore, the adjoint solver, like the 
flow solver, uses an explicit multistage Runge-Kutta-like algorithm 
accelerated by residual smoothing and multigridding. Intra-block 
communication is again handled through a double halo which allows 
for the full transfer of information across boundaries except for the 
stencil of support for the implicit residual smoothing. 

Step (3) in the above procedure is the portion of the method that 
is still treated by finite differences. Fortunately, all of these steps 
incur only a trivial computational cost compared with even a single 
flow analysis time step. It is therefore possible, without significant 
penalty, to leave this in finite difference form even for cases where 
many hundreds of design variables are used. 

The present implementation uses the quasi-Newton algorithm, 
QNMDIF, developed by Gill, Murray and Pitfield [10] and en- 
hanced by Kennelly [27], to calculate the search direction. It is an 
unconstrained optimization algorithm that uses Broyden-Fletcher- 
Goldfarb-Shanno (BFGS) updates to the Cholesky factored Hessian 
matrix. A complete treatment of the quasi-Newton and other opti- 
mization strategies is given by Gill, Murray and Wright [11]. 


8 



PARALLEL EFFICIENCY 


MuMNock Design Cod* > Paratol Sp*«dup Curva 


Forproblems with alow task granularity (ratio of the number of bytes 
received by a processor to the number of floating point operations 
it performs), large parallel efflciencies can be obtained. Unfortu- 
nately, convergence acceleration techniques developed in the 1980s 
base their success on global communication in the computational 
domain. Thus, current multigrid and implicit residual smoothing 
techniques could pwssibly hinder parallel performance in traditional 
mesh sizes. However, for larger meshes used in viscous flows and 
complex geometries, the granularity becomes low enough, and the 
parallel performance is quite high. 

A careful implementation of the communication schedules can 
yield a design method with excellent parallel performance. The par- 
allel efficiency of the design method is presented in Figure 2 for the 
case of the automated redesignof a typical business jet configuration. 
In the modeling of the aircraft, a wing -body-nacelle topology was 
used containing 72 blocks of varying sizes with a total of 750,000 
cells. It must be noted that the parallel speed-ups below are com- 
puted using true execution wall-times. The execution time of the 
one processor job is taken as that of the most efficient serial im- 
plementation of the method. In addition, the complete execution 
of the design problem (for parallel performance calculations) was 
taken to be that of 2 complete design iterations, which include sev- 
eral flow and adjoint solutions. Finally, I/O time was also included 
in the performance calculations. As one can see, for a problem of 
moderate size such as this, excellent speed-ups are obtained using 
up to 16 processors. The speed-up for the 32 processor calculation 
shows a breaking trend due to the impossibility of achieving good 
load balancing by partitioning a 72 block mesh into 32 processors. 
The optimum solution for this problem shows a spread in the load 
balancing of almost 10%. This, together with the fact that the serial 
parts of the algorithm play a large role for problems with short exe- 
cution times (large numbers of processors), accounts for the decrease 
in parallel efficiency. 

Just as important as the speed-up curves of the method is the true 
wall-clock execution time. Two complete design iterations were 
completed in under 35 minutes using 32 processors. The rapid re- 
sponse of the design method clearly allows the possibility of investi- 
gating a large numberof configurations, and increases the probability 
of arriving at a truly optimum design. Timings for the 1 , 5(X), 000-cell 
HSCT mesh used in the Results sections can be readily extrapolated 
(rom the ratio of the numbers of cells in the two meshes. 

NUMERICAL TESTS AND RESULTS 

To demonstrate the utility of the design method, a supersonic trans- 
port configuration will be used as a testbed for the optimization 
algorithm. The baseline supersonic transport configuration depicted 
in Figures 4 — 6 was sized to accommodate 300 passengers with a 
gross take-off weight of 750,000 lbs. The supersonic cruise point 
is Mach 2.2 with a Ci of 0.105. As can be seen in Figure 6, the 
planform has a break in the leading edge sweep. The inboard leading 
edge sweep is 68.5 degrees while the outboard is 49.5 degrees. Since 
the Mach angle at M = 2.2 is 63 degrees it is clear that some leading 
edge bluntness may be used inboard without a significant wave drag 
penalty. Airfoils which use blunt leading edges were created that 
range from 4% thick at the root to 2.5% thick at the leading edge 



Figure 2: Parallel Speed-ups for the Adjoint Design 
Method 


break point. The symmetric initial airfoils were chosen with the pur- 
pose of accommodating thick spars at roughly 30% and 80% chord 
over the span up to the leading edge break. Outboard of the lead- 
ing edge break where the wing sweep is ahead of the Mach cone, a 
sharp leading edge was used to avoid undue wave drag. The airfoils 
were chosen to be symmetric, biconvex shapes modified to have a 
region of constant thickness over the mid-chord. Figure 5 shows that 
the four-engine configuration features axisymmetric nacelles tucked 
close to the wing lower surface. This layout favors reduced wave 
drag by minimizing the exposed diverter area. However, it may be 
problematic because of the channel flows occurring in the juncture 
region of the diverter, wing, and nacelle at the wing trailing edge. The 
leading edge heights of the diverters are determined by the boundary 
layer local displacement thickness such that entrainment of boundary 
layer flow into the engines is avoided. Since the distance from the 
wing leading edge to the diverter leading edge is different for the 
two nacelle, this causes a corresponding diverter height difference. 

In this design problem, we have chosen to attempt the straightfor- 
ward task of recambering the wing so as to minimize the total con- 
figuration wave drag while maintaining fixed Cl and Mach number. 
This represents a much simpler problem than allowing both thickness 
and camber to vary while maintaining spar constraints and adequate 
fuel volume. However, since the complex nacelle geometry and re- 
sulting meshes on the wing lower surface must be moved in unison 
with the wing motion, even a camber-only design represents a sig- 
nificant challenge for demonstration purposes. Further optimization 
benefits could also be attained not only by allowing thickness vari- 
ations in the wing optimization but also by performing both body 
optimization and nacelle orientation and position optimization. Such 
studies will be made in future work. 

Returning to the wing camber design at hand, the mesh system 
shown in Figure 4 was created with 180 blocks and 1,533,440 com- 
putational cells including halos. While the point-to-point topology 
forces a larger number of blocks to be used, this is actually benefi- 
cial from the standpoint of parallel load balancing. Ninety design 
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variables of the Hicks<Henne type are lofted in both the span wise 
and chordwise direction. These are spread over the entire wing such 
that complete freedom is allowed with the exception of the wing root 
intersection and the wing trailing edge. The former was held fixed 
since the capability of re intersecting different geometry elements 
has yet to be implemented. The latter was held fixed since it was 
determined that freedom in the trailing edge would likely produce 
waves in the span wise direction along the trailing edge which would 
violate possible flap constraints. Figure 6 shows the solution on the 
upper and lower surface for the baseline configuration at the cruise 
point. Tlie nacelles and diverters were forced to follow the varia- 
tions that were imposed on the wing. This allowed the nacelles to be 
included in the optimization without approximations. The rest of the 
complex mesh system in and around the nacelles shown in Figure 4 
was perturbed automatically as the design proceeded. 

Due to a shortage of both resources and time, only a two-design- 
cycle run of the optimization algorithm was achieved using 16 pro- 
cessors on an IBM SP2. This must be thought of as a very preliminary 
result. It nevertheless demonstrates the power of a parallel imple- 
mentation of the adjoint-based design method. The total wall clock 
time was 1 .75 hours for the entire run. Figure 7 shows the final solu- 
tion at the end of the second design iteration for both the upper and 
lower surfaces. It is clear that large differences have been made to the 
upper surface pressure field. The comparisons of the cross-sectional 
cuts of both the airfoils and the Cps shown in Figure 8 are more 
indicative of the design differences. The strong oblique shock that 
ran near the leading edge has been largely softened. Meanwhile, the 
lower surface regions around the nacelles have been tuned so as to 
cusp the region where the nacelle shocks impinge on the wing lower 
surface. This creates a forward facing region with high Cps, thus 
reducing wave drag. The final overall pressure drag was reduced 
by 3.5% or from Cd = 0.00882 to Od — 0.00851. It is confidently 
believed that much more could be gained through the investment of 
more computer resources. 

CONCLUSIONS 

In the period since this approach to optimal shape design was first 
proposed by the fourth author [19], the method has been verified by 
numerical implementation for both potential flow and flows modeled 
by the Euler equations [20, 39, 25, 23). It has been demonstrated 
that it can be successfully used with a finite volume formulation 
to perform calculations with arbitrary numerically generated grids 
[39, 25]. Further, results have been presented for three-dimensional 
calculations using both the analytic mapping and general finite vol- 
ume implementations [40]. In the last year the technique has been 
adopted by some industry participants to perform the aerodynamic 
design of future configurations [9]. With the parallel implementa- 
tion of the multiblock flow solver presented here, the technology has 
advanced to the degree that aerodynamic shape design of complete 
aircraft configurations with very rapid turnaround is possible. 

In this paper we have shown how the complicated design of super- 
sonic aircraft configurations including airframe/nacelle interaction 
effects can be accomplished in a routine fashion with the current 
method. For demonstration purposes a typical supersonic configu- 
ration with closely coupled nacelles and diverters was used as a test 
case. Nacelles of the axisymmetric type were employed and situated 
such that they almost abutted the wing lower surface. The diverters 


filling the space between the nacelles and the wing lower surface 
were constructed to eliminate the possibility of boundary layer en- 
trainment into the nacelle inlets. These geometry entities as well 
as the complete wing and body were fully modeled using the new 
multiblock analysis and design method. A preliminary camber-only 
design was carried out to optimize the cruise point performance for 
the complete aircraft configuration subject to wing thickness con- 
straints and fixed lift. All analysis and design cases were performed 
on parallel architecture machines in less than one day, demonstrat- 
ing that complete configuration designs may be achieved with rapid 
turn-around even with the most conservative estimates of available 
computational resources. In future efforts, the techniques will be 
extended to address both unstructured meshes and flows governed 
by the Navier-Stokes equations. 
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Block II Center 



Block I Center 


Figure 3: 4 Block interface using a double halo of cells around each block. 
Each block’s double halo of cells contains information from internal cells in 

adjacent blocks. 
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Generic Supersonic Transport Configuration 
^ SYN87-MB Grid Structure: 180 Blocks 
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Figure 4 
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Figure 6 
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Figure 7 



8a: span station 7/ = 0.116 


8b: span station ?/ = 0.374 



8c: span station 2 = 0.632 



Figure 8: SYN87-MB Fixed Lift Drag Minimization. 
180-Block Mesh, 1,500 K mesh cells, M = 2.2 
90 Camber Hicks-Henne variables. 

, Initial Pressure Coefficient 

— , Final Pressure Coefficient. 
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